
    Improving Term Extraction with Terminological Resources

    Studies of different term extractors on a corpus of the biomedical domain revealed decreasing performance when applied to highly technical texts. The difficulty or impossibility of customising them to new domains is an additional limitation. In this paper, we propose to use external terminologies to influence generic linguistic data in order to improve the quality of the extraction. The tool we implemented exploits testified terms at different steps of the process: chunking, parsing and extraction of term candidates. Experiments reported here show that, using this method, more term candidates can be acquired with a higher level of reliability. We further describe the extraction process involving endogenous disambiguation implemented in the term extractor YaTeA.
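The abstract above describes exploiting testified terms (entries from an external terminology) during term-candidate extraction. A minimal sketch of that idea is shown below; the terminology entries, the n-gram enumeration, and the two-tier labelling are illustrative assumptions, not the actual YaTeA implementation.

```python
# Hedged sketch: use an external terminology of "testified terms" to guide
# term-candidate extraction. Spans that match the terminology are accepted
# directly; other multi-word spans remain lower-confidence candidates.
TESTIFIED_TERMS = {"gene expression", "transcription factor", "binding site"}

def extract_candidates(tokens, max_len=4):
    """Enumerate n-gram spans, labelling terminology matches as 'testified'."""
    candidates = []
    for size in range(max_len, 0, -1):          # prefer longer spans first
        for i in range(len(tokens) - size + 1):
            span = " ".join(tokens[i:i + size]).lower()
            if span in TESTIFIED_TERMS:
                candidates.append((span, "testified"))
            elif size > 1:
                candidates.append((span, "candidate"))
    return candidates

tokens = "The transcription factor regulates gene expression".split()
print([c for c in extract_candidates(tokens) if c[1] == "testified"])
```

In a fuller pipeline the same lookup would also constrain chunking and parsing decisions, as the abstract indicates.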

    Normalizing biomedical terms by minimizing ambiguity and variability

    Background: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherent heavy computational cost discourages its use when the dictionaries are large or when real time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in a constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and thus is the bottleneck of the normalization approach.
    Results: We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS.
    Conclusions: The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction especially when good normalization heuristics for the target terminology are not fully known.
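The normalization approach described above maps surface variants of a term to one canonical key, so dictionary lookup stays constant-time regardless of dictionary size. A minimal sketch follows; the specific rules (lowercasing, hyphen and whitespace removal) and the example entries are illustrative assumptions, not the rules learned by the paper's framework.

```python
# Hedged sketch of rule-based term normalization for O(1) dictionary lookup.
import re

def normalize(term):
    """Apply simple normalization rules to reduce term variability."""
    term = term.lower()                  # collapse case variation
    term = re.sub(r"[-\s]+", "", term)   # collapse hyphen/whitespace variation
    return term

def build_index(dictionary):
    """Map normalized forms to concept identifiers for constant-time lookup."""
    index = {}
    for term, concept_id in dictionary:
        index.setdefault(normalize(term), concept_id)
    return index

dictionary = [("NF-kappa B", "GENE:0001"), ("interleukin 6", "GENE:0002")]
index = build_index(dictionary)
print(index[normalize("NF kappaB")])   # variants collapse to the same key
```

The trade-off the abstract highlights is visible here: aggressive rules collapse more variants (less variability) but risk merging distinct terms (more ambiguity), which is exactly the balance the discovered rules are optimized for.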

    Predictability study on the aftershock sequence following the 2011 Tohoku-Oki, Japan, earthquake: first results

    Although no deterministic and reliable earthquake precursor is known to date, we are steadily gaining insight into probabilistic forecasting that draws on space–time characteristics of earthquake clustering. Clustering-based models aiming to forecast earthquakes within the next 24 hours are under test in the global project ‘Collaboratory for the Study of Earthquake Predictability’ (CSEP). The 2011 March 11 magnitude 9.0 Tohoku-Oki earthquake in Japan provides a unique opportunity to test the existing 1-day CSEP models against its unprecedentedly active aftershock sequence. The original CSEP experiment performs tests after the catalogue is finalized to avoid bias due to poor data quality. However, this study departs from that tradition and uses the preliminary catalogue revised and updated by the Japan Meteorological Agency (JMA), which is often incomplete but is immediately available. This study is intended as a first step towards operability-oriented earthquake forecasting in Japan. Encouragingly, at least one model passed the test in most combinations of the target day and the testing method, although the models could not take account of the megaquake in advance and the catalogue used for forecast generation was incomplete. However, it can also be seen that all models have only limited forecasting power for the period immediately after the quake. Our conclusion does not change when the preliminary JMA catalogue is replaced by the finalized one, implying that the models perform stably over the catalogue replacement and are applicable to operational earthquake forecasting. However, we emphasize the need for further research on model improvement to assure the reliability of forecasts for the days immediately after the main quake. Seismicity is expected to remain high in all parts of Japan over the coming years. Our results present a way to answer the urgent need to promote research on time-dependent earthquake predictability to prepare for subsequent large earthquakes in the near future in Japan.

    Variation of radiocesium concentrations in cedar pollen in the Okutama area since the Fukushima Daiichi Nuclear Power Plant Accident

    Due to releases of radionuclides in the Fukushima Daiichi Nuclear Power Plant Accident, radiocesium (¹³⁴Cs and ¹³⁷Cs) has been incorporated into a large variety of plant species and soil types. There is a possibility that radiocesium taken into plants is being diffused by pollen. Radiocesium concentrations in cedar pollen have been measured in Ome City, located in the Okutama area of metropolitan Tokyo, for the past 3 years. In this research, the variation of radiocesium concentrations was analysed by comparing data from 2011 to 2014. Air dose rates at 1 m above the ground surface in Ome City from 2011 to 2014 showed no significant difference. The concentration of ¹³⁷Cs contained in the cedar pollen in 2012 was about half that in 2011. Between 2012 and 2014, the concentration decreased by approximately one fifth, which was similar to the result of a press release distributed by the Japanese Ministry of Agriculture, Forestry and Fisheries.